15 research outputs found

    Autoencoders for natural language semantics

    Autoencoders are artificial neural networks that learn representations. In an autoencoder, the encoder transforms an input into a representation, and the decoder tries to recover the input from the representation. This thesis compiles three applications of these models to natural language processing: learning word and sentence representations, and better understanding compositionality. In the first paper, we show that we can autoencode dictionary definitions to learn word vectors, called definition embeddings. We propose a new penalty that allows us to use these definition embeddings as inputs to the encoder itself, but also to blend them with pretrained distributional vectors. The definition embeddings capture semantic similarity better than distributional methods such as word2vec. Moreover, the encoder generalizes to some degree to definitions unseen during training. In the second paper, we analyze the representations learned by sequence-to-sequence variational autoencoders. We find that the encoders tend to memorize the first few words and the length of the input sentence, which drastically limits their usefulness as controllable generative models. We also analyze simpler architectural variants that are agnostic to word order, as well as pretraining-based methods. The representations they learn tend to encode global features such as topic and sentiment more sharply, and this shows in the reconstructions they produce. In the third paper, we use language emergence simulations to study compositionality. A speaker – the encoder – observes an input and produces a message about it. A listener – the decoder – tries to reconstruct what the speaker talked about from its message. We hypothesize that producing sentences involving several entities, such as “John loves Mary”, fundamentally requires perceiving each entity, John and Mary, as a distinct whole. We endow some agents with this ability via an attention mechanism and deprive others of it. We propose various metrics that measure how natural the agents’ languages are in terms of argument structure, and whether they are more analytic or synthetic. Agents that perceive entities as distinct wholes exchange more natural messages than the other agents.
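    The encoder–decoder setup described above can be made concrete with a minimal sketch, assuming PyTorch; the GRU architecture, dimensions, and token handling are illustrative choices rather than the thesis's actual models, and the consistency penalty on definition embeddings is omitted.

        # Minimal definition-autoencoder sketch (illustrative assumptions, see above).
        import torch
        import torch.nn as nn

        class DefinitionAutoencoder(nn.Module):
            def __init__(self, vocab_size=10_000, emb_dim=128, hid_dim=256):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
                self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
                self.out = nn.Linear(hid_dim, vocab_size)

            def forward(self, tokens):
                # Encoder: compress the whole definition into one vector.
                _, h = self.encoder(self.embed(tokens))
                # Decoder with teacher forcing: sees the previous token plus
                # the definition vector, and predicts the next token.
                bos = torch.zeros_like(tokens[:, :1])  # token id 0 as <bos>
                dec_in = self.embed(torch.cat([bos, tokens[:, :-1]], dim=1))
                dec_out, _ = self.decoder(dec_in, h)
                return self.out(dec_out)  # logits over the vocabulary

        model = DefinitionAutoencoder()
        tokens = torch.randint(1, 10_000, (4, 12))  # 4 toy definitions, 12 tokens each
        logits = model(tokens)
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, logits.size(-1)), tokens.reshape(-1))
        loss.backward()  # the reconstruction loss drives representation learning

    In this sketch the learned definition vector is simply the encoder's final hidden state; the penalty mentioned in the abstract, which feeds such vectors back into the encoder and blends them with pretrained distributional vectors, is left out for brevity.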

    DART: a Dataset of Arguments and their Relations on Twitter

    The problem of understanding the stream of messages exchanged on social media such as Facebook and Twitter is becoming a major challenge for automated systems. The tremendous amount of data exchanged on these platforms, as well as the specific form of language adopted by social media users, constitutes a new and challenging context for existing argument mining techniques. In this paper, we describe a resource of natural language arguments called DART (Dataset of Arguments and their Relations on Twitter), which covers the complete argument mining pipeline over Twitter messages: (i) we identify which tweets can be considered arguments and which cannot, and (ii) we identify the relation, i.e., support or attack, linking such tweets to each other.
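    As a rough illustration of that two-stage pipeline, here is a hedged sketch using scikit-learn; the TF-IDF features, logistic regression classifiers, pair encoding, and toy examples are assumptions for illustration, not the baselines shipped with DART.

        # Two-stage argument mining sketch (illustrative, not DART's own baselines).
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Stage (i): decide whether a tweet is an argument at all.
        tweets = ["Vaccines save lives, the data is clear.", "good morning everyone!"]
        is_argument = [1, 0]
        arg_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        arg_clf.fit(tweets, is_argument)

        # Stage (ii): classify the relation between two argumentative tweets.
        pairs = ["Vaccines save lives. [SEP] Side effects are extremely rare.",
                 "Vaccines save lives. [SEP] Vaccines are dangerous."]
        relation = ["support", "attack"]
        rel_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
        rel_clf.fit(pairs, relation)

        print(arg_clf.predict(["I think taxes should fund healthcare."]))
        print(rel_clf.predict(["Vaccines save lives. [SEP] They do more harm than good."]))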

    Learning GFlowNets from partial episodes for improved convergence and stability

    Generative flow networks (GFlowNets) are a family of algorithms for training a sequential sampler of discrete objects under an unnormalized target density and have been successfully used for various probabilistic modeling tasks. Existing training objectives for GFlowNets are either local to states or transitions, or propagate a reward signal over an entire sampling trajectory. We argue that these alternatives represent opposite ends of a gradient bias-variance tradeoff and propose a way to exploit this tradeoff to mitigate its harmful effects. Inspired by the TD($\lambda$) algorithm in reinforcement learning, we introduce subtrajectory balance or SubTB($\lambda$), a GFlowNet training objective that can learn from partial action subsequences of varying lengths. We show that SubTB($\lambda$) accelerates sampler convergence in previously studied and new environments and enables training GFlowNets in environments with longer action sequences and sparser reward landscapes than what was possible before. We also perform a comparative analysis of stochastic gradient dynamics, shedding light on the bias-variance tradeoff in GFlowNet training and the advantages of subtrajectory balance.
    Comment: ICML 202
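    For intuition, the balance condition over partial episodes and a $\lambda$-weighted objective can be sketched as follows, using standard GFlowNet notation (state flow $F$, forward policy $P_F$, backward policy $P_B$, trajectory $\tau = (s_0, \dots, s_n)$); this is a reconstruction from the abstract and the TD($\lambda$) analogy, not necessarily the paper's exact formulation.

        % Balance over a subtrajectory s_i, ..., s_j (assumed notation):
        F(s_i) \prod_{t=i}^{j-1} P_F(s_{t+1} \mid s_t)
          = F(s_j) \prod_{t=i}^{j-1} P_B(s_t \mid s_{t+1})

        % A lambda-weighted squared log-ratio loss over all subtrajectories:
        \mathcal{L}_{\mathrm{SubTB}(\lambda)}(\tau)
          = \frac{\sum_{0 \le i < j \le n} \lambda^{j-i}
                  \left( \log \frac{F(s_i) \prod_{t=i}^{j-1} P_F(s_{t+1} \mid s_t)}
                                   {F(s_j) \prod_{t=i}^{j-1} P_B(s_t \mid s_{t+1})} \right)^{2}}
                 {\sum_{0 \le i < j \le n} \lambda^{j-i}}

    Under this reading, small $\lambda$ concentrates the loss on short, transition-local subtrajectories (lower variance, higher bias), while large $\lambda$ approaches the full-trajectory objective (lower bias, higher variance), which is the tradeoff the abstract describes.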

    Anti-tumour necrosis factor discontinuation in inflammatory bowel disease patients in remission: study protocol of a prospective, multicentre, randomized clinical trial

    Background: Patients with inflammatory bowel disease who achieve remission with anti-tumour necrosis factor (anti-TNF) drugs may have treatment withdrawn due to safety concerns and cost considerations, but there is a lack of prospective, controlled data investigating this strategy. The primary study aim is to compare the rates of clinical remission at 1 year in patients who discontinue anti-TNF treatment versus those who continue treatment. Methods: This is an ongoing, prospective, double-blind, multicentre, randomized, placebo-controlled study in patients with Crohn's disease or ulcerative colitis who have achieved clinical remission for ≄6 months with an anti-TNF treatment and an immunosuppressant. Patients are being randomized 1:1 to discontinue anti-TNF therapy or continue therapy. Randomization stratifies patients by the type of inflammatory bowel disease and drug (infliximab versus adalimumab) at study inclusion. The primary endpoint of the study is sustained clinical remission at 1 year. Other endpoints include endoscopic and radiological activity, patient-reported outcomes (quality of life, work productivity), safety and predictive factors for relapse. The required sample size is 194 patients. In addition to the main analysis (discontinuation versus continuation), subanalyses will include stratification by type of inflammatory bowel disease, phenotype and previous treatment. Biological samples will be obtained to identify factors predictive of relapse after treatment withdrawal. Results: Enrolment began in 2016, and the study is expected to end in 2020. Conclusions: This study will contribute prospective, controlled data on outcomes and predictors of relapse in patients with inflammatory bowel disease after withdrawal of anti-TNF agents following achievement of clinical remission. Clinical trial reference number: EudraCT 2015-001410-1